modulo image
UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging
A conventional camera often suffers from over- or under-exposure when recording a real-world scene with a very high dynamic range (HDR). In contrast, a modulo camera with a Markov random field (MRF) based unwrapping algorithm can theoretically achieve unbounded dynamic range, but its performance degrades in the presence of modulus-intensity ambiguity, strong local contrast, and color misalignment. In this paper, we reformulate the modulo image unwrapping problem as a series of binary labeling problems and propose a modulo edge-aware model, named UnModNet, to iteratively estimate the binary rollover masks of the modulo image for unwrapping. Experimental results show that our approach can reliably generate 12-bit HDR images from 8-bit modulo images, and that it runs much faster than the previous MRF-based algorithm thanks to GPU acceleration.
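The reformulation described in the abstract can be illustrated with a small sketch. It assumes an 8-bit modulo image folded from a 12-bit scene, and it replaces the learned network with a hypothetical oracle (`oracle_mask`) that has access to the ground truth; the function names and the stopping rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def to_modulo(hdr, bits=8):
    """Fold an integer HDR image into `bits` bits by keeping the residue."""
    return hdr % (1 << bits)

def unwrap_iteratively(modulo, predict_mask, bits=8, max_iters=15):
    """Solve unwrapping as a series of binary labeling problems.

    At each step `predict_mask` returns a binary rollover mask; pixels
    labeled 1 receive one more period (2**bits). The loop stops when the
    mask is empty or after `max_iters` steps (15 rollovers are enough to
    reach 12 bits from 8 bits).
    """
    period = 1 << bits
    estimate = modulo.astype(np.int64)
    for _ in range(max_iters):
        mask = predict_mask(estimate)
        if not mask.any():
            break
        estimate = estimate + period * mask.astype(np.int64)
    return estimate

# Toy 12-bit scene and its 8-bit modulo observation.
hdr = np.array([[100, 300], [4095, 1000]], dtype=np.int64)
modulo = to_modulo(hdr)

# Hypothetical stand-in for the learned model: a pixel still needs a
# rollover while the current estimate is below the true value.
def oracle_mask(estimate):
    return estimate < hdr

recovered = unwrap_iteratively(modulo, oracle_mask)
```

In a learned setting the oracle would be replaced by a model that predicts the mask from the current estimate alone; the surrounding loop would stay the same.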
Modulo Video Recovery via Selective Spatiotemporal Vision Transformer
Geng, Tianyu, Ji, Feng, Tay, Wee Peng
Conventional image sensors have limited dynamic range, causing saturation in high-dynamic-range (HDR) scenes. Modulo cameras address this by folding incident irradiance into a bounded range, yet require specialized unwrapping algorithms to reconstruct the underlying signal. Unlike HDR recovery, which extends dynamic range from conventional sampling, modulo recovery restores actual values from folded samples. Despite being introduced over a decade ago, progress in modulo image recovery has been slow, especially in the use of modern deep learning techniques. In this work, we demonstrate that standard HDR methods are unsuitable for modulo recovery. Transformers, however, can capture global dependencies and spatial-temporal relationships crucial for resolving folded video frames. Still, adapting existing Transformer architectures for modulo recovery demands novel techniques. To this end, we present Selective Spatiotemporal Vision Transformer (SSViT), the first deep learning framework for modulo video reconstruction. SSViT employs a token selection strategy to improve efficiency and concentrate on the most critical regions. Experiments confirm that SSViT produces high-quality reconstructions from 8-bit folded videos and achieves state-of-the-art performance in modulo video recovery.
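The difference between saturation and folding can be made concrete with a few sample values. The sketch below assumes an 8-bit sensor and integer irradiance values chosen purely for illustration; it is not taken from the paper.

```python
import numpy as np

# Illustrative irradiance samples, some beyond the 8-bit range.
irradiance = np.array([50, 255, 256, 700, 1023], dtype=np.int64)

# Conventional sensor: values above the maximum saturate and become
# indistinguishable from one another.
clipped = np.minimum(irradiance, 255)

# Modulo sensor: values wrap around, so the residue is preserved.
folded = irradiance % 256

# Modulo recovery amounts to finding the integer rollover count per sample.
rollovers = irradiance // 256
restored = folded + 256 * rollovers
```

Here `clipped` maps every sample at or above 255 to the same value, whereas `folded` together with the correct rollover counts reproduces the original samples exactly. This is why recovery from folded samples is a different problem from extending the range of saturated ones.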
Supplementary Material: UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging
Chu Zhou, Hang Zhao, Jin Han, Chang Xu
We could apply a binary search to achieve this, as shown in Algorithm 1 below. The formation of a spike can be expressed as an "accumulate-fire-reset" cycle: The ... This signal also resets the corresponding accumulator, in which all the electric charges are drained (i.e., resets ... Specifically, the sensor checks the accumulators periodically within a fixed interval.
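The "accumulate-fire-reset" cycle can be sketched for a single pixel as follows. The excerpt omits the details, so the threshold, the constant per-check intensity, and the function name `simulate_spikes` are assumptions made for illustration only.

```python
def simulate_spikes(intensity, threshold, num_checks):
    """Simulate one pixel's accumulator over `num_checks` periodic checks.

    Returns a list of 0/1 flags, one per check, where 1 marks a spike.
    Assumes a constant amount of charge (`intensity`) arrives between checks.
    """
    charge = 0
    spikes = []
    for _ in range(num_checks):
        charge += intensity          # accumulate incoming charge
        if charge >= threshold:      # fire when the threshold is reached
            spikes.append(1)
            charge = 0               # reset: all charge is drained
        else:
            spikes.append(0)
    return spikes

dim = simulate_spikes(intensity=1, threshold=3, num_checks=9)
bright = simulate_spikes(intensity=2, threshold=3, num_checks=6)
```

Under these assumptions a brighter pixel fires more often over the same number of checks, so the spike rate carries the intensity information.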
Review for NeurIPS paper: UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging
Weaknesses: My primary concern with this paper is that the problem it addresses is *extremely* niche: modulo cameras are a somewhat obscure topic even within the computational imaging community. If I were reviewing this paper for a computational imaging/photography conference, I would be more charitable towards it. But this subject is unlikely to be of interest to the general NeurIPS audience, and this paper seems unlikely to reach its intended audience if presented at NeurIPS. The neural network architecture is so specifically tailored to this particular problem that I'm not sure what a general ML researcher could take away from this paper. Nor am I convinced that this is a problem that should be popularized with ML researchers as, again, a solution to this problem has limited practical value given that modulo cameras are still a largely hypothetical concept. My other concern with this paper (which would be a significant concern even if I were reviewing it for a computational imaging conference) is that the baseline evaluation is misleading.
Review for NeurIPS paper: UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging
The submission has received two positive and two negative reviews. The post-rebuttal discussion has not led to convergence, and the opinions of the reviewers remain split. The concerns of the "negative" reviewers are: 1) The application is too niche (R1). However, the topic of the paper falls within the NeurIPS call for papers, as it is related to low-level computer vision, compressed sensing, and deep neural architectures. The authors rebut that the results in [55] were cherry-picked and that they use the code from [55], while fixing the parameters.